Preface
This post trains a multivariate linear regression model with TensorFlow and compares the result against scikit-learn. The dataset comes from Andrew Ng's online open course Deep Learning.
Code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @author: 陈水平
# @date: 2016-12-30
# @description: compare multivariate linear regression in TensorFlow with scikit-learn, based on data from Andrew Ng's Deep Learning course
# @ref: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex3/ex3.html
#
import numpy as np
import tensorflow as tf
from sklearn import linear_model
from sklearn import preprocessing
# Read x and y
x_data = np.loadtxt("ex3x.dat").astype(np.float32)
y_data = np.loadtxt("ex3y.dat").astype(np.float32)
# First fit x and y with sklearn to get a sense of the coefficients.
reg = linear_model.LinearRegression()
reg.fit(x_data, y_data)
print "Coefficients of sklearn: K=%s, b=%f" % (reg.coef_, reg.intercept_)
# Now we use TensorFlow to get similar results.
# Before feeding x_data into TensorFlow, we standardize it
# so that gradient descent performs well; without standardization
# the convergence would be intolerably slow.
# Reason: if one feature has a variance orders of magnitude larger
# than the others, it can dominate the objective function and keep
# the estimator from learning the other features correctly.
scaler = preprocessing.StandardScaler().fit(x_data)  # zero mean, unit variance per feature
print scaler.mean_, scaler.scale_
x_data_standard = scaler.transform(x_data)
W = tf.Variable(tf.zeros([2, 1]))  # one weight per feature
b = tf.Variable(tf.zeros([1, 1]))  # intercept
y = tf.matmul(x_data_standard, W) + b
loss = tf.reduce_mean(tf.square(y - y_data.reshape(-1, 1))) / 2
optimizer = tf.train.GradientDescentOptimizer(0.3)  # learning rate 0.3
train = optimizer.minimize(loss)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for step in range(100):
    sess.run(train)
    if step % 10 == 0:
        print step, sess.run(W).flatten(), sess.run(b).flatten()
print "Coefficients of tensorflow (input should be standardized): K=%s, b=%s" % (sess.run(W).flatten(), sess.run(b).flatten())
print "Coefficients of tensorflow (raw input): K=%s, b=%s" % (sess.run(W).flatten() / scaler.scale_, sess.run(b).flatten() - np.dot(scaler.mean_ / scaler.scale_, sess.run(W)))
The output is as follows:
Coefficients of sklearn: K=[ 139.21066284 -8738.02148438], b=89597.927966
[ 2000.6809082 3.17021275] [ 7.86202576e+02 7.52842903e-01]
0 [ 31729.23632812 16412.6484375 ] [ 102123.7890625]
10 [ 97174.78125 5595.25585938] [ 333681.59375]
20 [ 106480.5703125 -3611.31201172] [ 340222.53125]
30 [ 108727.5390625 -5858.10302734] [ 340407.28125]
40 [ 109272.953125 -6403.52148438] [ 340412.5]
50 [ 109405.3515625 -6535.91503906] [ 340412.625]
60 [ 109437.4921875 -6568.05371094] [ 340412.625]
70 [ 109445.296875 -6575.85644531] [ 340412.625]
80 [ 109447.1875 -6577.75097656] [ 340412.625]
90 [ 109447.640625 -6578.20654297] [ 340412.625]
Coefficients of tensorflow (input should be standardized): K=[ 109447.7421875 -6578.31152344], b=[ 340412.625]
Coefficients of tensorflow (raw input): K=[ 139.21061707 -8737.9609375 ], b=[ 89597.78125]
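The two sets of raw-scale coefficients agree closely. As a quick sanity check, the ordinary least-squares solution can also be computed in closed form with NumPy (a minimal sketch, assuming x_data and y_data from the script above are still in scope); it should agree with both results:

# Append an intercept column and solve min ||X * theta - y||^2 directly.
X = np.hstack([x_data, np.ones((x_data.shape[0], 1))])
theta = np.linalg.lstsq(X, y_data)[0]  # theta = [K1, K2, b]
print "Closed-form solution: K=%s, b=%f" % (theta[:2], theta[2])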
Thoughts
For gradient descent, whether the variables are standardized matters a great deal. In this example one variable is the living area and the other is the number of rooms, and their magnitudes differ widely; without normalization, the area term dominates the objective function and the gradient, and convergence becomes extremely slow, as the sketch below illustrates.
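A minimal NumPy sketch of this effect (assuming x_data, x_data_standard and y_data from the script above): with standardized features a learning rate of 0.3 converges within a hundred steps, while on the raw features the learning rate has to shrink to roughly 1e-7 before the updates stop diverging, because the curvature along the area direction scales with the square of the ~2000-sized area values.

# Plain batch gradient descent on the half-mean-squared-error loss,
# matching the TensorFlow objective above.
def gradient_descent(X, y, lr, steps):
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * X.T.dot(X.dot(theta) - y) / X.shape[0]
    return theta

print gradient_descent(x_data_standard, y_data, 0.3, 100)  # converges quickly
print gradient_descent(x_data, y_data, 1e-7, 100)          # stable but glacially slow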